On Probability Distributions for Trees: Representations, Inference and Learning
Abstract
We study probability distributions over free algebras of trees. Probability distributions can be seen as particular (formal power) tree series [BR82; EK03], i.e. mappings from trees to a semiring K. A widely studied class of tree series is the class of rational (or recognizable) tree series, which can be defined either algebraically or by means of multiplicity tree automata. We argue that the algebraic representation is very convenient for modeling probability distributions over a free algebra of trees. First, as in the string case, the algebraic representation allows the design of learning algorithms for the whole class of probability distributions defined by rational tree series. Note that learning algorithms for rational tree series correspond to learning algorithms for weighted tree automata in which both the structure and the weights are learned. Second, the algebraic representation can be easily extended to deal with unranked trees (such as XML trees, where a symbol may have an unbounded number of children). Both properties are particularly relevant for applications: nondeterministic automata are required for the inference problem to be relevant (recall that Hidden Markov Models are equivalent to nondeterministic string automata), and current applications in Web Information Extraction, Web Services and document processing consider unranked trees.
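To make the automaton view concrete, here is a minimal sketch of how a multiplicity (weighted) tree automaton over the real semiring assigns a weight to a ranked tree by bottom-up evaluation. It is an illustration only, not the paper's construction: the tuple encoding of trees, the dictionary encoding of transition weights, and the names `state_vector`, `tree_weight`, `mu` and `final` are all assumptions made for this example.

```python
# A tree is a tuple (symbol, child_1, ..., child_k); a leaf is (symbol,).
# mu[symbol] maps (q, q_1, ..., q_k) to the weight of reaching state q
# at a node labelled `symbol` whose children are in states q_1, ..., q_k.
# (Hypothetical encoding chosen for this sketch.)

def state_vector(tree, mu, n):
    """Return the list of n state weights computed bottom-up for `tree`."""
    symbol, *children = tree
    child_vecs = [state_vector(child, mu, n) for child in children]
    out = [0.0] * n
    for (q, *qs), weight in mu[symbol].items():
        term = weight
        # Multiply by the weight of each child being in its required state.
        for vec, qi in zip(child_vecs, qs):
            term *= vec[qi]
        out[q] += term
    return out


def tree_weight(tree, mu, final, n):
    """Weight of `tree`: final weights combined with the root state vector."""
    vec = state_vector(tree, mu, n)
    return sum(f * v for f, v in zip(final, vec))


# Example with one state, a leaf symbol "a" and a binary symbol "f".
# Each node is labelled "f" with weight 0.3 and "a" with weight 0.7,
# so a tree's weight is 0.3 ** (#f nodes) * 0.7 ** (#a nodes).
mu = {"a": {(0,): 0.7}, "f": {(0, 0, 0): 0.3}}
final = [1.0]

t = ("f", ("a",), ("f", ("a",), ("a",)))
w = tree_weight(t, mu, final, n=1)
```

In this one-state example the weights describe a simple branching process; with more states the same evaluation expresses nondeterministic automata, which is the case the abstract singles out as relevant for inference.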
Similar resources
Classification and properties of acyclic discrete phase-type distributions based on geometric and shifted geometric distributions
Acyclic phase-type distributions form a versatile model, serving as approximations to many probability distributions in various circumstances. They exhibit special properties and characteristics that usually make their applications attractive. Compared to acyclic continuous phase-type (ACPH) distributions, acyclic discrete phase-type (ADPH) distributions and their subclasses (ADPH family) have ...
Learning Multi-Linear Representations for Efficient
We examine the class of multi-linear representations (MLR) for expressing probability distributions over discrete variables. Recently, MLRs have been considered as intermediate representations that facilitate inference in distributions represented as graphical models. We show that MLR is an expressive representation of discrete distributions and can be used to concisely represent classes of dis...
Efficient Bayesian Inference by Factorizing Conditional Probability Distributions
Bayesian inference becomes more efficient when one makes use of the structure that is contained within the conditional probability tables that together constitute a joint probability distribution over a set of discrete random variables. Such structure can be represented in the form of probability trees or Boolean polynomials. However, in order to make use of such representations in Bayesian inf...
Fitting Tree Height Distributions in Natural Beech Forest Stands of Guilan (Case Study: Masal)
This research investigated the modeling of tree height distributions of beech in the natural forests of Masal, located in Guilan province. The inventory was carried out using systematic random sampling on a 150×200 m grid with sample plots of 0.1 ha. DBH and heights of 630 beech trees in 30 sample plots were measured. Beta, Gamma, Normal, Log-normal and Weibull prob...
Denali: A tool for visualizing scalar functions as landscape metaphors
Many sources of data in machine learning, such as probability distributions and cost functions, can be summarized by extracting tree-like structures. These trees, however, are often large enough so as to be difficult to visualize with traditional methods. In this paper, we present denali, an interactive interface with novel features for visualizing landscape metaphor representations of trees, a...
Journal: CoRR
Volume: abs/0807.2983
Publication date: 2008